Spotify Exploratory Dataset Analysis

Introduction

Code and Documentation

Version control tools:

  • Git

  • GitHub

Background

  • The data was acquired through the Spotify API in 2020 by TidyTuesday

  • The class of the data frame

[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 

  • The number of rows

[1] 32833

The data set

  • The variables (columns)

 [1] "track_id"                 "track_name"              
 [3] "track_artist"             "track_popularity"        
 [5] "track_album_id"           "track_album_name"        
 [7] "track_album_release_date" "playlist_name"           
 [9] "playlist_id"              "playlist_genre"          
[11] "playlist_subgenre"        "danceability"            
[13] "energy"                   "key"                     
[15] "loudness"                 "mode"                    
[17] "speechiness"              "acousticness"            
[19] "instrumentalness"         "liveness"                
[21] "valence"                  "tempo"                   
[23] "duration_ms"             

The different Genres

  • The main genres in the data

  edm latin   pop   r&b   rap  rock 
 6043  5155  5507  5431  5746  4951 
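
A per-genre count like the table above can be sketched in a few lines. The example below uses plain Python rather than the tool that produced the output shown here, and the rows in `tracks` are invented for illustration; only the column names `track_id` and `playlist_genre` come from the data set.

```python
from collections import Counter

# Hypothetical sample rows for illustration only; not taken from the real data.
tracks = [
    {"track_id": "a1", "playlist_genre": "pop"},
    {"track_id": "a2", "playlist_genre": "rock"},
    {"track_id": "a3", "playlist_genre": "pop"},
    {"track_id": "a4", "playlist_genre": "edm"},
]

# Count rows per genre, then print genres alphabetically, as in the table above.
genre_counts = Counter(row["playlist_genre"] for row in tracks)
for genre in sorted(genre_counts):
    print(genre, genre_counts[genre])
```

The counts are per row, so a track that appears in several playlists is counted once per appearance.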

Cleaning: NA values

  • Check for observations with NA values

    # A tibble: 5 × 4
      track_name track_artist track_album_name track_id              
      <chr>      <chr>        <chr>            <chr>                 
    1 <NA>       <NA>         <NA>             69gRFGOWY9OMpFJgFol1u0
    2 <NA>       <NA>         <NA>             5cjecvX0CmC9gK0Laf5EMQ
    3 <NA>       <NA>         <NA>             5TTzhRSWQS4Yu8xTgAuq6D
    4 <NA>       <NA>         <NA>             3VKFip3OdAvv4OfNTgFWeQ
    5 <NA>       <NA>         <NA>             69gRFGOWY9OMpFJgFol1u0
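
  • After investigating, these turn out to be unique songs whose name, artist, and album fields are missing (rows 1 and 5 share a track_id, so the five rows cover four distinct tracks)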
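
A check of this kind can be sketched as follows, assuming the data is read from CSV and that an empty field stands in for NA. The example uses plain Python; the CSV excerpt in `raw` is made up for illustration, and only the column names come from the data set.

```python
import csv
import io

# Hypothetical CSV excerpt; empty fields stand in for NA values.
raw = """track_id,track_name,track_artist,track_album_name
id1,Song A,Artist A,Album A
id2,,,
id3,Song C,Artist C,Album C
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Keep rows where at least one field is empty (treated here as missing).
rows_with_missing = [
    row for row in rows if any(value == "" for value in row.values())
]

for row in rows_with_missing:
    missing_columns = [name for name, value in row.items() if value == ""]
    print(row["track_id"], "is missing:", missing_columns)
```

Listing the `track_id` next to the missing columns makes it possible to look the affected tracks up by ID before deciding whether to keep them.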

Cleaning: Duplicates

  • Number of entries with a duplicated track name

    [1] 9383

  • Number of entries with a duplicated track ID

    [1] 4477
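
One way to count duplicates is to count every row whose key has already appeared in an earlier row. That definition is an assumption, not something the slides state. The sketch below uses plain Python; `count_repeats` is a hypothetical helper and the sample rows are invented.

```python
def count_repeats(rows, key):
    """Count rows whose value for `key` already appeared in an earlier row."""
    seen = set()
    repeats = 0
    for row in rows:
        value = row[key]
        if value in seen:
            repeats += 1
        else:
            seen.add(value)
    return repeats


# Hypothetical sample rows for illustration only.
tracks = [
    {"track_id": "id1", "track_name": "Intro"},
    {"track_id": "id2", "track_name": "Intro"},  # same name, different song
    {"track_id": "id1", "track_name": "Intro"},  # same song in another playlist
    {"track_id": "id3", "track_name": "Outro"},
]

name_repeats = count_repeats(tracks, "track_name")
id_repeats = count_repeats(tracks, "track_id")
print(name_repeats, id_repeats)
```

A shared name does not imply the same song, since different artists can reuse a title, which is why the ID is the safer key for deduplication.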

Clean data

  • The new data with no duplicates (28,356 rows in total, that is, 32,833 minus the 4,477 duplicated IDs)

  edm latin   pop   r&b   rap  rock 
 4877  4137  5132  4504  5401  4305 
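
Removing duplicates could look like the sketch below, which keeps the first row seen for each `track_id`. Keeping the first occurrence is an assumption; the slides do not say which copy was retained. The example is plain Python, `drop_duplicate_ids` is a hypothetical helper, and the sample rows are invented.

```python
def drop_duplicate_ids(rows, key="track_id"):
    """Return rows with only the first occurrence of each `key` value kept."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique_rows.append(row)
    return unique_rows


# Hypothetical sample rows for illustration only.
tracks = [
    {"track_id": "id1", "playlist_genre": "pop"},
    {"track_id": "id2", "playlist_genre": "rock"},
    {"track_id": "id1", "playlist_genre": "latin"},  # repeat of id1
]

clean_tracks = drop_duplicate_ids(tracks)
print(len(tracks), "->", len(clean_tracks))
```

With this choice a track that sits in playlists of two genres is counted only under the genre of its first row, which affects per-genre totals.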

Exploratory Data Analysis

Initial Data Exploration

  • Most popular artists

  • Most popular albums

  • Most popular genres
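
A ranking of genres by popularity can be sketched by averaging `track_popularity` within each `playlist_genre`. Using the mean as the measure of "most popular" is an assumption, since the slides do not name the statistic. The example is plain Python and the sample values are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows for illustration only.
tracks = [
    {"playlist_genre": "pop", "track_popularity": 80},
    {"playlist_genre": "pop", "track_popularity": 60},
    {"playlist_genre": "latin", "track_popularity": 90},
    {"playlist_genre": "latin", "track_popularity": 70},
    {"playlist_genre": "rock", "track_popularity": 50},
]

# Group popularity scores by genre.
popularity_by_genre = defaultdict(list)
for row in tracks:
    popularity_by_genre[row["playlist_genre"]].append(row["track_popularity"])

# Sort genres from highest to lowest mean popularity.
ranking = sorted(
    ((genre, mean(values)) for genre, values in popularity_by_genre.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for genre, average in ranking:
    print(genre, average)
```

The same grouping pattern applies to artists or albums by swapping the grouping column.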

How features change within genres

Latin

  • Latin stands out in danceability and valence.

  • Danceability

  • Valence

Release Date

  • Is there a relationship between album release date and popularity?
  • Are features affected by album release date?
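
One possible way to approach the first question is to reduce each release date to its year and compare mean popularity across years. The sketch below is plain Python and assumes the dates are strings that start with a four-digit year, possibly without month or day; `release_year` is a hypothetical helper and the sample rows are invented.

```python
from collections import defaultdict
from statistics import mean


def release_year(date_text):
    """Extract the year from a 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' string."""
    return int(date_text[:4])


# Hypothetical sample rows for illustration only.
tracks = [
    {"track_album_release_date": "2019-06-14", "track_popularity": 80},
    {"track_album_release_date": "2019-01-03", "track_popularity": 60},
    {"track_album_release_date": "1998", "track_popularity": 40},
]

# Group popularity scores by release year.
by_year = defaultdict(list)
for row in tracks:
    year = release_year(row["track_album_release_date"])
    by_year[year].append(row["track_popularity"])

# Mean popularity per year, ordered chronologically.
mean_popularity_by_year = {
    year: mean(values) for year, values in sorted(by_year.items())
}
print(mean_popularity_by_year)
```

Replacing `track_popularity` with a feature column such as `danceability` gives the matching summary for the second question.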

Conclusion

The analysis of the Spotify dataset yielded the following results:

  • Pop and Latin are the two most popular genres.

  • Higher danceability and valence are associated with higher popularity.

  • Energy and loudness are positively correlated.

  • Energy and acousticness are negatively correlated.
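
Correlation statements like these are commonly checked with the Pearson coefficient. Whether the original analysis used Pearson is an assumption. The sketch below implements it in plain Python; the feature values are invented and only illustrate the sign of the result, not the strength found in the real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical feature values for four tracks, for illustration only.
energy = [0.2, 0.5, 0.8, 0.9]
loudness = [-12.0, -8.0, -5.0, -4.0]
acousticness = [0.9, 0.6, 0.2, 0.1]

energy_vs_loudness = pearson(energy, loudness)
energy_vs_acousticness = pearson(energy, acousticness)
print(round(energy_vs_loudness, 3), round(energy_vs_acousticness, 3))
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it says nothing about causation.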